Performance Comparison of Gmm, Hmm and Dnn Based Approaches for Acoustic Event Detection within Task 3 of the Dcase 2016 Challenge
نویسندگان
چکیده
This contribution reports on the performance of systems for polyphonic acoustic event detection (AED) compared within the framework of the “detection and classification of acoustic scenes and events 2016” (DCASE’16) challenge. State-of-the-art Gaussian mixture model (GMM) and GMM-hidden Markov model (HMM) approaches are applied using Mel-frequency cepstral coefficients (MFCCs) and Gabor filterbank (GFB) features and a non-negative matrix factorization (NMF) based system. Furthermore, tandem and hybrid deep neural network (DNN)-HMM systems are adopted. All HMM systems that usually are of single label type, i.e., systems that only output one label per time segment from a set of possible classes, are extended to multi label classification systems that are a compound of single binary classifiers classifying between target and non-target classes and, thus, are capable of multi labeling. These systems are evaluated for the data of residential areas of Task 3 from the DCASE’16 challenge. It is shown that the DNN based system performs worse than the traditional systems for this task. Best results are achieved using GFB features in combination with a single label GMM-HMM approach.
منابع مشابه
Deep Neural Network Baseline for Dcase Challenge 2016
The DCASE Challenge 2016 contains tasks for Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), and audio tagging. Since 2006, Deep Neural Networks (DNNs) have been widely applied to computer visions, speech recognition and natural language processing tasks. In this paper, we provide DNN baselines for the DCASE Challenge 2016. In Task 1 we obtained accuracy of 81.0% using Mel +...
متن کاملA Comparison of deep learning methods for environmental sound
Environmental sound detection is a challenging application of machine learning because of the noisy nature of the signal, and the small amount of (labeled) data that is typically available. This work thus presents a comparison of several state-of-the-art Deep Learning models on the IEEE challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 challenge task and data,...
متن کاملFully DNN-based Multi-label regression for audio tagging
Acoustic event detection for content analysis in most cases relies on a lot of labeled data. However, manually annotating data is a time-consuming task, which thus makes few annotated resources available so far. Unlike audio event detection, automatic audio tagging, a multi-label acoustic event classification task, only relies on weakly labeled data. This is highly desirable to some practical a...
متن کاملHierarchical learning for DNN-based acoustic scene classification
In this paper, we present a deep neural network (DNN)-based acoustic scene classification framework. Two hierarchical learning methods are proposed to improve the DNN baseline performance by incorporating the hierarchical taxonomy information of environmental sounds. Firstly, the parameters of the DNN are initialized by the proposed hierarchical pre-training. Multi-level objective function is t...
متن کاملSound Event Detection for Real Life Audio DCASE Challenge
We explore logistic regression classifier (LogReg) and deep neural network (DNN) on the DCASE 2016 Challenge for task 3, i.e., sound event detection in real life audio. Our models use the Mel Frequency Cepstral Coefficients (MFCCs) and their deltas and accelerations as detection features. The error rate metric favors the simple logistic regression model with high activation threshold on both se...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016